Abstract

A new variant of bit interleaved coded modulation (BICM) is proposed. In the new scheme, called parallel BICM, L identical 
binary codes are used in parallel using a mapper, a newly proposed finite-length interleaver and a binary dither signal. As opposed 
to previous approaches, the scheme does not rely on any assumptions of an ideal, infinite-length interleaver. Over a memoryless 
channel, the new scheme is proven to be equivalent to a binary memoryless channel. Therefore the scheme enables one to easily 
design coded modulation schemes using a simple binary code that was designed for that binary channel. The overall performance 
of the coded modulation scheme is analytically evaluated based on the performance of the binary code over the binary channel.
The new scheme is analyzed from an information-theoretic viewpoint, where the capacity, error exponent and channel dispersion
are considered. The capacity of the scheme is identical to the BICM capacity. The error exponent of the scheme is numerically
compared to a recently proposed mismatched-decoding exponent analysis of BICM.

I. Introduction 

Bit interleaved coded modulation (BICM) is a pragmatic approach for coded modulation [1]. It enables the construction
of nonbinary communication schemes from binary codes by using a long bit interleaver that separates the coding and the 
modulation. BICM has drawn much attention in recent years, because of its efficiency for wireless and fading channels. 

The information-theoretic properties of BICM were first studied by Caire et al. in [2]. BICM was modeled as a binary
channel with a random state that is known at the receiver. The state determines how the input bit is mapped to the channel
input, along with the other bits, which are assumed to be random. Under the assumption of an infinite-length, ideal interleaver,
the BICM scheme is modeled by parallel uses of independent instances of this binary channel. This model is referred to as the
independent parallel channel model.

Using this model the capacity of the BICM scheme could be calculated. It was further shown that BICM suffers from a gap
from the full channel capacity, and that when Gray mapping is used this gap is generally small. In [2], methods for evaluating
the error probability of BICM were proposed, which rely on the properties of the specific binary codes that were used (e.g. the
Hamming weights of error events).

A basic information-theoretic quantity other than the channel capacity is the error exponent [3], which quantifies the speed
at which the error probability decreases to zero with the block length n. Another tool for evaluating the performance at finite
block length is the channel dispersion, which was presented in 1962 [4] and has received more attention only in recent years
[5], [6]. It would therefore be interesting to analyze BICM at finite block length from the information-theoretic viewpoint.
Several attempts have been made to provide error exponent results for BICM.

In their work on multilevel codes, Wachsmann et al. [7] considered the random coding error exponent of BICM, relying
on the independent parallel channel model. However, there were several flaws in the derivation:

• The independent parallel channel model is justified by an infinite-length interleaver. Therefore it might be problematic
to use its properties for evaluating the finite-length performance of the BICM scheme. In the current paper we address
this point and propose a scheme with a finite-length interleaver for that purpose.

• There was a technical flaw in the derivation, which resulted in an inaccurate expression for the random coding error
exponent. We discuss this point in detail following Theorem 4.

• As noticed in [8], the error exponent obtained in [7] may sometimes even exceed that of unconstrained coding
over the channel (called the "coded modulation exponent" in [8]). We therefore agree with [8] in the claim that "the
independent parallel channel model fails to capture the statistics of the channel". However, by properly designing the
communication scheme, the model can become valid in a rigorous way, as we show in Theorem 1.

In [8] (see also [9]), Martinez et al. considered the BICM decoder as a mismatched decoder, which has access only to
the log-likelihood ratios (LLRs) of each bit, where the LLR calculation assumes that the other bits are random, independent and
equiprobable (as in the classical BICM scheme [2]). Using results from mismatched decoding, they presented the generalized
error exponent and the generalized mutual information, and pinpointed the loss that BICM incurs from using the mismatched
LLRs. Note that when a binary code of length n is used, the scheme requires only n/L channel uses. While this result is
valid for any block size and any interleaver length, achieving this error exponent in practice requires complex code design.
For example, one cannot design a good binary code for a binary memoryless channel and have any guarantee that the BICM
scheme will perform well with that code. In fact, the code design for this scheme requires taking into account the memory 
within the levels, or equivalently, nonbinary codes, which is what we wish to avoid when choosing BICM. 

On the theoretical side, another drawback of existing approaches is the lack of converse results (for either capacity or
error exponent). The initial discussion of BICM information theory in [2] assumes the model of independent channels, and
any converse result based on this model must assume an infinite, ideal interleaver. Therefore the converse results (such
as upper bounds on the achievable rate with BICM) do not hold for finite-length interleavers. The authors in [8] provide no
converse results for their model.

In this paper we propose the parallel BICM (PBICM) scheme, which has the following properties. First, the scheme includes
an explicit, finite-length interleaver. Second, in order to attain good performance on any memoryless channel, PBICM allows
one to design a binary code for a binary memoryless channel, and guarantees good performance on the nonbinary channel.
Third, because the scheme does not rely on the use of an infinite-length interleaver, the error exponent and the dispersion of
the scheme can be calculated (both achievability and converse results) as a means to evaluate the PBICM performance at finite
block length.

The comparison between PBICM and the mismatched decoding approach [8] should be done with care. With PBICM, when
the binary codeword length is n, the scheme requires n channel uses. Therefore, when the latency is kept equal for both schemes,
PBICM uses a codeword length that is L times shorter than the codeword used with the mismatched decoder. A fair comparison
would be to fix the binary codeword length n for both schemes, resulting in different latency but equal decoder complexity.

The results presented in the paper are summarized as follows: 

• The PBICM communication framework is presented. Over a memoryless channel, it is shown to be equivalent to a binary
memoryless channel (Theorem 1).

• In Theorem 2 the capacity of PBICM is shown to be equal to the BICM capacity, as calculated in [2].

• PBICM is analyzed at finite block length. The error exponent of PBICM is defined and bounded by error exponent bounds
of the underlying binary channel (Theorems 3 and 4).

• The PBICM dispersion is defined as an alternative measure for finite-length performance. It is calculated from the dispersion
of the underlying binary channel (Theorems 5 and 6).

• The error exponent of PBICM is numerically compared to the mismatched-decoding error exponent of BICM [8]. The
additive white Gaussian noise (AWGN) channel and the Rayleigh fading channel are considered. When the latency of
both schemes is equal, mismatched decoding is generally better. However, when the complexity is equal (i.e., when
the codeword length of the underlying binary code is equal), the PBICM exponent is better in many cases.

The paper is organized as follows. 

In Section II we review the classical BICM model and its properties, under the assumption of an infinite-length, ideal
interleaver. In Section III the parallel BICM scheme is presented and the equivalence to a memoryless binary channel is
established. In Section IV parallel BICM is studied from an information-theoretic viewpoint. Numerical examples and a
summary follow in Sections V and VI, respectively.

II. The BICM Communication Model 

Notation: letters in bold ($\mathbf{x}, \mathbf{y}, \ldots$) denote row vectors, capital letters ($X, Y, \ldots$) denote random variables, and a tilde denotes
interleaved signals ($\tilde{\mathbf{b}}, \tilde{\mathbf{z}}$). $P_X(x)$ denotes the probability that the random variable (RV) $X$ takes the value $x$, and similarly
$P_{Y|X}(y|x)$ denotes the probability that $Y$ takes the value $y$ given that the RV $X$ is equal to $x$. $\mathbb{E}[\cdot]$ denotes statistical expectation, and
log means $\log_2$.

A. Channel model 

Let $W$ denote a memoryless channel with input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The transition probabilities are
defined by $W(y|x)$ for $y \in \mathcal{Y}$ and $x \in \mathcal{X}$. We assume that $|\mathcal{X}| = 2^L$. We consider equiprobable signaling only over the
channel $W$.

An $(n, R)$ code $\mathcal{C} \subseteq \mathcal{X}^n$ is a set of $M = 2^{nR}$ codewords $\mathbf{c} \in \mathcal{X}^n$. The encoder wishes to convey one of $M$ equiprobable
messages. The error probability of interest shall be the codeword error probability. An $(n, R)$ code with codeword error
probability $p_e$ will sometimes be called an $(n, R, p_e)$ code.

B. Classical BICM encoding and decoding 

In BICM, a binary code is used to encode information messages $[m_1, m_2, \ldots]$ into binary codewords $[\mathbf{b}_1, \mathbf{b}_2, \ldots]$. The binary
codewords are then interleaved using a long interleaver $\pi(\cdot)$, which applies a permutation to the coded bits. The interleaved
bit stream $\tilde{\mathbf{b}}$ is partitioned into groups of $L$ consecutive bits, which are fed into a mapper $\mu : \{0,1\}^L \to \mathcal{X}$. The mapper output,
denoted $\mathbf{x}$, is fed into the channel. The encoding process is described in Figure 1.
Fig. 1. BICM encoding process
Fig. 2. BICM decoding process



The decoding process of BICM proceeds as follows. The channel output $\mathbf{y}$ is fed into a bit metric calculator, which calculates
the log-likelihood ratio (LLR) of each input bit $\tilde{b}$ given the corresponding output sample $y$ ($L$ LLR values for each output
sample). These LLR values (or bit metrics), denoted $\tilde{\mathbf{z}}$, are de-interleaved and partitioned into bit metrics $[\mathbf{z}_1, \mathbf{z}_2, \ldots]$ that
correspond to the binary input codewords. Finally, the binary decoder decodes the messages $[\hat{m}_1, \hat{m}_2, \ldots]$ from $[\mathbf{z}_1, \mathbf{z}_2, \ldots]$. The
decoding process is described in Figure 2.

The LLR of the $j$-th bit in a symbol given the output value $y$ is calculated as follows:

$$\mathrm{LLR}_j(y) = \log \frac{P_{Y|B_j}(y|0)}{P_{Y|B_j}(y|1)}, \tag{1}$$

where $P_{Y|B_j}(y|b)$ is the conditional probability of the channel output taking the value $y$ given that the $j$-th bit at the mapper
input was $b$, and the other $(L-1)$ bits are equiprobable independent binary random variables (RVs).
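The bit metric described above can be computed directly by marginalizing over the other $L-1$ equiprobable bits. The following is a minimal sketch, not the paper's implementation; it assumes (hypothetically) a Gray-labeled, unit-energy 8-PSK constellation, a complex AWGN channel with $W(y|x) \propto \exp(-|y-x|^2/N_0)$, and labels read MSB first as $(b_1, \ldots, b_L)$. The helper names `gray_8psk` and `bit_llrs` are our own. A log-sum-exp is used to avoid underflow, and the result is expressed in bits to match the convention log = $\log_2$.

```python
import cmath
import math


def gray_8psk():
    """Hypothetical Gray-labeled 8-PSK: points[label] is a unit-energy symbol.

    The k-th point on the circle (angle 2*pi*k/8) gets label k ^ (k >> 1),
    so neighbouring points differ in exactly one bit.
    """
    points = [0j] * 8
    for k in range(8):
        points[k ^ (k >> 1)] = cmath.exp(2j * math.pi * k / 8)
    return points


def _logsumexp(values):
    """Numerically stable natural log of sum(exp(v))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bit_llrs(y, points, L, n0):
    """LLR_j(y) = log2[P(y|b_j=0) / P(y|b_j=1)] for j = 1..L.

    P(y|b_j=b) averages W(y|x) over the 2^(L-1) labels whose j-th bit
    (MSB first) equals b; the normalizing constants cancel in the ratio.
    """
    metrics = [-abs(y - x) ** 2 / n0 for x in points]  # ln W(y|x) + const
    llrs = []
    for j in range(1, L + 1):
        m0 = [m for lab, m in enumerate(metrics) if (lab >> (L - j)) & 1 == 0]
        m1 = [m for lab, m in enumerate(metrics) if (lab >> (L - j)) & 1 == 1]
        llrs.append((_logsumexp(m0) - _logsumexp(m1)) / math.log(2))
    return llrs


points = gray_8psk()
print(bit_llrs(points[0] + 0.1j, points, L=3, n0=0.5))
```

A positive value favours $b_j = 0$; a receiver for a different constellation would only need a different `points` list.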

C. Classical BICM analysis: ideal interleaving 

In classical BICM (e.g. [2]) the LLR calculation is motivated by the assumption of a very long (ideal) interleaver $\pi$, so that the
coded bits go through essentially independent channels. These binary channels are defined as follows:

Definition 1: Let $W_j$ be a binary channel with transition probability

$$W_j(y|b) = P_{Y|B_j}(y|b) = \frac{1}{2^{L-1}} \sum_{x \in \mathcal{X}_b^j} W(y|x),$$

where $\mathcal{X}_b^j$ is the set of channel symbols whose label has the value $b$ in its $j$-th position. The channel $W_j(y|b_j)$ can be thought
of as the original channel $W$ where the input is $x = \mu(b_1, \ldots, b_L)$ and the bits $\{b_i\}_{i \neq j}$ are equiprobable independent RVs (see Fig. 3).

In [2], Caire et al. proposed the following channel model for BICM, called the independent parallel channel model. In
this model the channel has a binary input $b$. A channel state $s$ is selected at random from $\mathcal{S} = \{1, \ldots, L\}$ with equal probability
(and independently of $b$). Given a state $s$, the input bit $b$ is fed into the channel $W_s$. The channel outputs are the state $s$ and
the output $y$ of the channel $W_s$. The channel, denoted by $\bar{W}$, is depicted in Figure 4.

The transition probability function of $\bar{W}$ is given by

$$\bar{W}(y, s|b) = \frac{1}{L} W_s(y|b), \qquad s \in \{1, \ldots, L\}.$$

Fig. 3. The binary channel $W_j$. The bits $\{B_i\}_{i \neq j}$ are equiprobable independent RVs.
Fig. 4. The binary channel $\bar{W}$. The random state $S$ is known at the receiver.



Note that both outputs can be combined into a single output, the LLR, which is a sufficient statistic for optimal decoding
over any binary-input channel. The LLR calculation for the channel $\bar{W}$ is given by

$$\mathrm{LLR}_{\bar{W}}(y, s) = \mathrm{LLR}_s(y), \tag{5}$$

where $\mathrm{LLR}_s$ is given in (1).

Therefore the independent parallel channel model transforms the original nonbinary channel $W$ into a simple, memoryless binary
channel $\bar{W}$. Using an infinite-length interleaver and a binary code that was designed for the simple binary channel $\bar{W}$, reliable
communication over the original channel $W$ can be attained.

Let $C(\cdot)$ denote the Shannon capacity of a channel (with equiprobable input).

Lemma 1 (following [2]): Let $C^{BICM}(W)$ denote the capacity of the channel $W$ with BICM, a given mapping $\mu(\cdot)$ and an
infinite-length interleaver (according to the independent parallel channel model). Denote by $C(W_s)$ the capacity of the channel
$W_s$. Then

$$C^{BICM}(W) = \sum_{s=1}^{L} C(W_s). \tag{6}$$

Proof: Since the independent parallel channel model assumes $L$ independent uses of the channel $\bar{W}$ per channel symbol, we get that
$C^{BICM}(W) = L \cdot C(\bar{W})$. The capacity of $\bar{W}$ is given by

$$C(\bar{W}) = I(B; Y, S) = I(B; Y|S) = \mathbb{E}_S\, I(B; Y|S = s) = \mathbb{E}_S\, C(W_S) = \frac{1}{L} \sum_{s=1}^{L} C(W_s),$$

where the second equality holds because $S$ is independent of $B$. ■
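To make Definition 1 and (6) concrete, here is a minimal sketch for a discrete memoryless channel stored as a table `channel[x][y]`. It assumes, for illustration only, that the symbol index $x$ is the integer whose binary expansion (MSB first) is the label $(b_1, \ldots, b_L)$, i.e. an identity mapper; the helper names and the example channel are hypothetical. `sub_channel` builds $W_s$ by averaging over the other equiprobable bits, and `bicm_capacity` sums $C(W_s)$ as in Lemma 1.

```python
import math


def mutual_information(channel):
    """I(X;Y) in bits for a DMC given as rows channel[x][y], uniform input."""
    nx, ny = len(channel), len(channel[0])
    p_y = [sum(row[y] for row in channel) / nx for y in range(ny)]
    return sum(
        w / nx * math.log2(w / p_y[y])
        for row in channel
        for y, w in enumerate(row)
        if w > 0
    )


def sub_channel(channel, s, L):
    """Binary channel W_s, assuming symbol x carries label bits b_1..b_L (MSB first)."""
    ny = len(channel[0])
    rows = [[0.0] * ny for _ in range(2)]
    for x, row in enumerate(channel):
        b = (x >> (L - s)) & 1  # s-th label bit of symbol x
        for y, w in enumerate(row):
            rows[b][y] += w / 2 ** (L - 1)  # other L-1 bits equiprobable
    return rows


def bicm_capacity(channel, L):
    """Eq. (6): C^BICM(W) = sum over s of C(W_s)."""
    return sum(mutual_information(sub_channel(channel, s, L)) for s in range(1, L + 1))


eps = 0.1  # hypothetical 4-ary channel: errors only to adjacent outputs
W = [
    [1 - eps, eps, 0.0, 0.0],
    [eps / 2, 1 - eps, eps / 2, 0.0],
    [0.0, eps / 2, 1 - eps, eps / 2],
    [0.0, 0.0, eps, 1 - eps],
]
print(bicm_capacity(W, L=2), mutual_information(W))
```

For any channel and labeling, the first value cannot exceed the second (the uniform-input $I(X;Y)$): the label bits are independent, so $I(B_j;Y) \le I(B_j;Y|B_1,\ldots,B_{j-1})$, and summing the right-hand side over $j$ gives $I(B_1,\ldots,B_L;Y)$.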

It is known that $C^{BICM}(W)$ is generally smaller than the full channel capacity $C(W)$, as opposed to other schemes, most
notably multilevel coding and multistage decoding (MLC-MSD) [7], in which $C(W)$ can be achieved. However, for Gray
mapping the gap is small and can sometimes be tolerated, for example for 8-PSK signaling over the AWGN channel with
SNR = 5 dB.
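The size of this gap can be estimated numerically. The sketch below is a Monte Carlo estimate, not the computation used in the paper; it assumes unit-energy Gray-labeled 8-PSK, complex AWGN with $E_s = 1$ and $N_0 = 10^{-\mathrm{SNR}/10}$, and uniform inputs. It uses $C(W) = L - \mathbb{E}\big[\log \sum_{x'} W(y|x')/W(y|x)\big]$ and, for each bit, $C(W_j) = 1 - \mathbb{E}\big[\log \sum_{x'} W(y|x') / \sum_{x'} W(y|x')\big]$ where the inner sum in the denominator runs over the symbols whose $j$-th label bit equals the transmitted one; by Lemma 1 the per-bit terms add up to $C^{BICM}(W)$. The function name, seed, and sample size are arbitrary choices, and the estimates carry a statistical error of roughly $1/\sqrt{\text{num\_samples}}$.

```python
import cmath
import math
import random


def gray_8psk():
    """Hypothetical Gray-labeled 8-PSK: points[label] is a unit-energy symbol."""
    points = [0j] * 8
    for k in range(8):
        points[k ^ (k >> 1)] = cmath.exp(2j * math.pi * k / 8)
    return points


def estimate_capacities(snr_db, num_samples=20000, seed=1):
    """Monte Carlo estimates of (C(W), C^BICM(W)) in bits/symbol, uniform inputs.

    Assumes complex AWGN with Es = 1 and N0 = 10**(-snr_db / 10).
    """
    L, M = 3, 8
    points = gray_8psk()
    n0 = 10 ** (-snr_db / 10)
    sigma = math.sqrt(n0 / 2)  # noise standard deviation per real dimension
    rng = random.Random(seed)
    cm_penalty = 0.0
    bicm_penalty = 0.0
    for _ in range(num_samples):
        label = rng.randrange(M)
        x = points[label]
        y = x + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        d0 = abs(y - x) ** 2
        # ratios[l] = W(y|points[l]) / W(y|x); ratios[label] == 1, so no sum is zero
        ratios = [math.exp((d0 - abs(y - p) ** 2) / n0) for p in points]
        total = sum(ratios)
        cm_penalty += math.log2(total)
        for j in range(L):  # bit order does not matter for the sum over j
            bit = (label >> j) & 1
            same = sum(r for lab, r in enumerate(ratios) if (lab >> j) & 1 == bit)
            bicm_penalty += math.log2(total / same)
    return L - cm_penalty / num_samples, L - bicm_penalty / num_samples


c_cm, c_bicm = estimate_capacities(5.0)
print(f"C(W) ~ {c_cm:.3f} bit, C_BICM(W) ~ {c_bicm:.3f} bit")
```

Both estimates reuse the same noise samples, which keeps the estimated difference less noisy than the individual values.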

III. The Parallel BICM Scheme 

In this section we propose an explicit BICM-type communication scheme, which we call parallel BICM (PBICM), that
allows the use of binary codes on nonbinary channels at finite block length. The main features of the scheme include the
following: 

• Binary codewords are used in parallel to construct a codeword that enters the channel. 

• A new finite-length interleaver. 

• A random binary signal (binary dither) that is added to the binary codewords. 

With the proposed scheme, we rigorously show how the original channel $W$ relates to the binary channel of the independent
parallel channel model, thus allowing exact analysis and design of codes at finite block lengths.

A. Interleaver Design 

We wish to design a finite-length interleaver, where:

• The length of the interleaver is minimal, 

• The interleaver should be as simple as possible, 

• The binary codewords will go through a binary memoryless channel. 
Fig. 5. Interleaving scheme viewed as parallel encoders
Fig. 6. De-interleaving scheme viewed as parallel decoders



In order for the binary codewords to experience a memoryless channel, each binary codeword must be spread over $n$ channel
uses of $W$, so the interleaver output length cannot be less than $n$ channel uses. The newly proposed interleaver has an output
length of exactly $n$, which satisfies the above requirements.

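Let ENC and DEC be an encoder-decoder pair for a binary code. Let $\mathbf{b}_1, \ldots, \mathbf{b}_L$ be $L$ consecutive codewords from the
output of ENC, bunched together into a matrix $B$:

$$B = \begin{pmatrix} \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_L \end{pmatrix} = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{L1} & \cdots & b_{Ln} \end{pmatrix}. \tag{7}$$

Let $\mathbf{s}$ be a vector of i.i.d. random states drawn uniformly from $\mathcal{S}^n = \{1, \ldots, L\}^n$; $\mathbf{s}$ shall be the interleaving signal. Each column of $B$
shall be shifted cyclically by the corresponding element $s_k$, so the interleaved signal $\tilde{B}$ is defined as

$$\tilde{b}_{\ell k} = b_{(\ell + s_k)_L,\, k}, \qquad \ell = 1, \ldots, L, \quad k = 1, \ldots, n, \tag{8}$$

where $(\ell)_L \triangleq (\ell \bmod L) + 1$.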

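Each column vector of the interleaved signal $\tilde{B}$ is mapped to a single channel symbol:

$$x_k = \mu\big(\tilde{b}_{1k}, \ldots, \tilde{b}_{Lk}\big) = \mu\big(b_{(1+s_k)_L,\, k}, \ldots, b_{(L+s_k)_L,\, k}\big), \tag{9}$$

and we call $\mathbf{x} = [x_1, \ldots, x_n]$ the channel codeword.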

At the decoder an LLR value is calculated from $\mathbf{y}$ for every bit $\tilde{b}$ in $\tilde{B}$. The LLR values are denoted by $\tilde{Z}$. We assume that
$\mathbf{s}$ is known at the decoder (utilizing common randomness); therefore the de-interleaving operation simply sorts back the
columns of $\tilde{Z}$ according to $\mathbf{s}$ by reversing the cyclic shift. The de-interleaver output is a vector of LLR values $\mathbf{z}$ for
each transmitted codeword $\mathbf{b}$, arranged as in (7). Each codeword is decoded independently by DEC.
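The cyclic-shift interleaver defined above and its inverse are simple enough to sketch directly. The following is a minimal illustration, assuming $B$ is stored as a list of $L$ rows of $n$ bits and $\mathbf{s}$ as a list of $n$ states in $\{1, \ldots, L\}$; the function names are hypothetical. The formula uses 1-based indices, so the code subtracts 1 when indexing Python lists. Because $\ell \mapsto (\ell + s_k)_L$ is a bijection of $\{1, \ldots, L\}$ for every column, the de-interleaver only has to write each entry back into the row it came from.

```python
import random


def wrap(l, L):
    """(l)_L = (l mod L) + 1, which maps any integer into {1, ..., L}."""
    return l % L + 1


def interleave(B, s):
    """tilde_b[l][k] = b[(l + s_k)_L][k]: column k is cyclically shifted by s_k."""
    L, n = len(B), len(B[0])
    return [[B[wrap(l + s[k], L) - 1][k] for k in range(n)] for l in range(1, L + 1)]


def deinterleave(Zt, s):
    """Inverse shift: entry (l, k) goes back to row (l + s_k)_L of column k."""
    L, n = len(Zt), len(Zt[0])
    Z = [[None] * n for _ in range(L)]
    for l in range(1, L + 1):
        for k in range(n):
            Z[wrap(l + s[k], L) - 1][k] = Zt[l - 1][k]
    return Z


rng = random.Random(0)
L, n = 3, 8
B = [[rng.randrange(2) for _ in range(n)] for _ in range(L)]  # L codewords as rows
s = [rng.randint(1, L) for _ in range(n)]                     # interleaving signal
Bt = interleave(B, s)
columns = [tuple(Bt[l][k] for l in range(L)) for k in range(n)]  # mapper inputs, as in (9)
print(columns)
```

The same `deinterleave` applies unchanged to a matrix of LLR values, since it only moves entries around.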

B. Binary dither 

Since the decoder decodes each binary codeword independently, the communication scheme employing the above interleaver
can be viewed as a set of parallel encoder-decoder pairs, which we denote by $\mathrm{ENC}_1, \ldots, \mathrm{ENC}_L$ and $\mathrm{DEC}_1, \ldots, \mathrm{DEC}_L$ (see
Figures 5 and 6). We do not assume any independence between the effective channels of the different encoder-decoder pairs.

Consider the first encoder-decoder pair, $\mathrm{ENC}_1$ and $\mathrm{DEC}_1$. Since the input of $\mathrm{DEC}_1$ depends on the codewords
transmitted by $\mathrm{ENC}_2, \ldots, \mathrm{ENC}_L$, the channel between $\mathrm{ENC}_1$ and $\mathrm{DEC}_1$ is not strictly memoryless. If, somehow, the encoders
$\mathrm{ENC}_2, \ldots, \mathrm{ENC}_L$ were forced to send i.i.d. equiprobable binary signals, then the channel between $\mathrm{ENC}_1$ and $\mathrm{DEC}_1$ would
be exactly the channel $\bar{W}$ (which is a binary memoryless channel), with the accurate LLR calculation (5).

In order to achieve the goal of $L$ binary memoryless channels between the encoder-decoder pairs simultaneously, we add
a binary dither (an i.i.d. equiprobable binary signal) to each encoder-decoder pair as follows.

Let the dither signals $\mathbf{d}_l = [d_{l1}, \ldots, d_{ln}]$, $l \in \{1, \ldots, L\}$, be $L$ random vectors, each of length $n$, that are drawn independently
from a memoryless equiprobable binary source. The output of each encoder $\mathrm{ENC}_l$, $\mathbf{b}_l$, goes through a component-wise XOR
operation with the dither vector $\mathbf{d}_l$. The output of the XOR operation, denoted $\mathbf{b}'_l$, is fed into the interleaver $\pi$. The full PBICM
encoding scheme is shown in Fig. 7.

Fig. 7. PBICM encoding scheme. '+' denotes modulo-2 addition (XOR).
Fig. 8. PBICM decoding scheme. $\boldsymbol{\delta}_l = 1 - 2 \cdot \mathbf{d}_l$, '×' denotes element-wise multiplication.

We let each decoder $\mathrm{DEC}_l$ know the value of the dither $\mathbf{d}_l$ used by its corresponding encoder $\mathrm{ENC}_l$ (in practice the dither
signals are generated using a pseudo-random generator, which provides the common randomness). In order to compensate for
the dither at the decoder, the LLR values are modified by flipping their sign for each dither value of 1 (and maintaining the
sign where the dither is 0). Formally, denote the LLR values at the de-interleaver output by $\mathbf{z}'_l = [z'_{l1}, \ldots, z'_{ln}]$. The LLR values
at the decoder inputs shall be denoted by $\mathbf{z}_l = [z_{l1}, \ldots, z_{ln}]$ and calculated as follows:

$$z_{lj} = (-1)^{d_{lj}}\, z'_{lj}, \qquad j = 1, \ldots, n. \tag{10}$$

The PBICM decoding scheme is shown in Fig. 8.
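The dither and its compensation in (10) amount to an XOR at the encoder and a sign flip at the decoder. The sketch below is illustrative only: it assumes that the encoder and decoder share a seed for Python's `random.Random` as the source of common randomness, and it replaces the channel and the bit-metric computation by a hypothetical noiseless stand-in (`toy_llr`) that outputs ±4, with the convention that a positive LLR favours 0.

```python
import random


def make_dither(seed, n):
    """Equiprobable i.i.d. dither bits from a seeded PRNG (shared seed = common randomness)."""
    rng = random.Random(seed)
    return [rng.randrange(2) for _ in range(n)]


def apply_dither(b, d):
    """Encoder side: b'_j = b_j XOR d_j."""
    return [bj ^ dj for bj, dj in zip(b, d)]


def compensate_llrs(z_prime, d):
    """Decoder side, eq. (10): z_j = (-1)^(d_j) * z'_j."""
    return [-zj if dj else zj for zj, dj in zip(z_prime, d)]


def toy_llr(bit):
    """Hypothetical noiseless stand-in for channel + bit metric: +4 favours 0, -4 favours 1."""
    return 4.0 if bit == 0 else -4.0


n, seed = 12, 2024
data_rng = random.Random(7)
b = [data_rng.randrange(2) for _ in range(n)]              # codeword bits from ENC_l
d_tx = make_dither(seed, n)                                 # dither used by ENC_l
z_prime = [toy_llr(bit) for bit in apply_dither(b, d_tx)]   # LLRs of the dithered bits
d_rx = make_dither(seed, n)                                 # DEC_l regenerates the same dither
z = compensate_llrs(z_prime, d_rx)
decisions = [0 if zj > 0 else 1 for zj in z]
print(decisions == b)
```

Since $\mathrm{LLR}$ of $b \oplus 1$ is the negative of the LLR of $b$, the sign flip exactly undoes the XOR, which is why the compensated metrics match those of the undithered bits.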



C. Model equivalence 

Before we analyze the channel between each encoder-decoder pair in PBICM, let us define a binary memoryless channel
that is related to $\bar{W}$, which will prove useful in the analysis of PBICM.

Definition 2: Let $\tilde{W}$ be a memoryless binary channel with input $B$ and output $(Y, S, D)$: $S$ is drawn uniformly at random from
$\{1, \ldots, L\}$, $D$ is drawn uniformly at random from $\{0, 1\}$ ($S$ and $D$ are independent, and neither depends on the input $B$). $Y$ is the
output of the channel $W_S$ with input $B \oplus D$ ($\oplus$ denotes the XOR operation). Note that the channel $\tilde{W}$ is the channel $\bar{W}$ where the
input is XORed with a binary RV $D$ (see Fig. 9).

Note that the LLR calculation for the channel $\tilde{W}$ is given by

$$\mathrm{LLR}_{\tilde{W}}(y, s, d) = (-1)^d\, \mathrm{LLR}_{\bar{W}}(y, s) = (-1)^d\, \mathrm{LLR}_s(y), \tag{11}$$

where $\mathrm{LLR}_{\bar{W}}$ and $\mathrm{LLR}_s$ are given in (5) and (1), respectively.

Theorem 1: In parallel BICM, the channel between every encoder-decoder pair is exactly the binary memoryless channel
$\tilde{W}$, with its exact LLR output.

Proof: Consider the pair $\mathrm{ENC}_1$ and $\mathrm{DEC}_1$. Let $\mathbf{b}_1$ be the codeword sent from $\mathrm{ENC}_1$. After adding the dither $\mathbf{d}_1$, the
dithered codeword $\mathbf{b}'_1$ enters the interleaver. The other codewords $\mathbf{b}_2, \ldots, \mathbf{b}_L$ are dithered using $\mathbf{d}_2, \ldots, \mathbf{d}_L$. Since the dither
of these codewords is unknown at $\mathrm{DEC}_1$, the dithered codewords $\mathbf{b}'_2, \ldots, \mathbf{b}'_L$ are truly random i.i.d. signals. The interleaving
signal $\mathbf{s}$ interleaves the dithered codewords according to (8). The interleaved signal enters the mapper $\mu$ and the channel $W$,
resulting in an output $\mathbf{y}$. Since the dithered codewords $\mathbf{b}'_2, \ldots, \mathbf{b}'_L$ are i.i.d., the equivalent channel from $\mathbf{b}'_1$ to $(\mathbf{y}, \mathbf{s})$ is exactly
the channel $\bar{W}$. The LLR calculation at the PBICM receiver along with the de-interleaver produces $\mathbf{z}'_1$, which is exactly the LLR
calculation that fits the channel $\bar{W}$ (cf. (5)).

Recalling that the channel $\tilde{W}$ is nothing but the channel $\bar{W}$ with its input XORed with a binary RV, and that the LLR of
the channel $\bar{W}$ can be easily modified by the dither according to Eq. (11) to produce the LLR of the channel $\tilde{W}$, we conclude
that the channel from $\mathbf{b}_1$ to $\mathbf{z}_1$ is exactly the channel $\tilde{W}$ with its LLR calculation.

Since by symmetry the above holds for any encoder-decoder pair $\mathrm{ENC}_l$-$\mathrm{DEC}_l$, the proof is concluded. ■

Fig. 9. The binary channel $\tilde{W}$. The random state $S$ and the dither $D$ are known at the receiver.

An important note should be made: parallel BICM allows the decomposition of the nonbinary channel $W$ into $L$ binary
channels of the type $\tilde{W}$. These $L$ channels are not independent. For example, if $W$ is an additive noise channel and at some
point the noise instance is very strong, this will affect all the decoders, and they will fail in decoding together. However,
since in the PBICM scheme each of the $L$ channels is decoded separately, the operation of each decoder depends only on the marginal
distribution of the relevant channel outputs. The outputs of these decoders will inevitably be statistically dependent, and we
take this into consideration when analyzing the performance of coding using PBICM in the following.

D. Error Probability Analysis 

We wish to analyze the performance of PBICM, and specifically, we are interested in the overall codeword error probability.
Let $\mathcal{C}$ be a binary $(n, R)$ code used in the PBICM scheme. To assure a fair comparison, we regard each $L$ consecutive
information messages $(m_1, \ldots, m_L)$ as a single message $m$, and regard the scheme as a code of length $n$ over the channel input
alphabet $\mathcal{X}$. We define the following error events: let $\mathcal{E}_l$ be the event of a codeword error in $\mathrm{DEC}_l$, and let $\mathcal{E}$ be the event
of an error in any of the messages $\{m_1, \ldots, m_L\}$, i.e. $\mathcal{E} = \bigcup_l \mathcal{E}_l$. Denote the corresponding error probabilities by $p_{e_l}$ and $p_e$,
respectively.

Corollary 1: Let $p_e(\tilde{W})$ be the codeword error probability of the code $\mathcal{C}$ over the channel $\tilde{W}$. Then the overall error
probability $p_e$ of the code $\mathcal{C}$ used with PBICM can be bounded by

$$p_e(\tilde{W}) \le p_e \le L \cdot p_e(\tilde{W}). \tag{12}$$

Proof: Since the error events $\mathcal{E}_l$ of codewords that are mapped to the same channel codeword are dependent, we
can only upper bound the overall error probability $p_e$ using the union bound. $p_e$ can also be lower bounded by the minimum of the
error probabilities of the individual channels:

$$\min\{p_{e_1}, \ldots, p_{e_L}\} \le p_e \le \sum_{l=1}^{L} p_{e_l}. \tag{13}$$

Since by Theorem 1 the channel between each of the encoder-decoder pairs is $\tilde{W}$, the error probabilities must
all be equal to the error probability of the code $\mathcal{C}$ over the channel $\tilde{W}$. Setting $p_{e_1} = p_{e_2} = \cdots = p_{e_L} = p_e(\tilde{W})$ in (13) completes
the proof. ■
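The bounds in (12) and (13) hold no matter how the decoding errors are correlated. As a toy illustration (the dependence model and the numbers below are hypothetical, not taken from the paper), suppose a shared state, e.g. a strong noise burst, occurs with some probability, and that given the state each of the $L$ decoders errs independently. The union probability can then be computed exactly and checked against Corollary 1.

```python
def error_probabilities(state_probs, err_given_state, L):
    """Return (marginal P(E_l), exact P(E_1 or ... or E_L)) for the toy model.

    state_probs[z]    : probability of the shared state z
    err_given_state[z]: probability that any single decoder errs given z
    Given z the L decoders err independently; across states they are dependent.
    """
    marginal = sum(pz * a for pz, a in zip(state_probs, err_given_state))
    union = sum(pz * (1 - (1 - a) ** L) for pz, a in zip(state_probs, err_given_state))
    return marginal, union


L = 3
p_single, p_overall = error_probabilities([0.99, 0.01], [1e-4, 0.5], L)
print(p_single, p_overall, L * p_single)
```

Both inequalities follow termwise from $a \le 1 - (1-a)^L \le L a$ for $a \in [0, 1]$, so the check succeeds for any choice of state probabilities.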

In many cases the bit error rate (BER) is of interest. Suppose that each of the messages $(m_1, \ldots, m_L)$ represents $k$ information
bits, so that the entire message $m$ represents $L \cdot k$ information bits. Let $\mathcal{E}^l_{k'}$ denote the event of an error in the $k'$-th bit of the information
message $m_l$. The average BER for the encoder-decoder pair $\mathrm{ENC}_l$-$\mathrm{DEC}_l$ is defined by

$$p_{b,l} \triangleq \frac{1}{k} \sum_{k'=1}^{k} \Pr\{\mathcal{E}^l_{k'}\}. \tag{14}$$

Similarly, define the overall average BER as

$$p_b \triangleq \frac{1}{L} \sum_{l=1}^{L} p_{b,l} = \frac{1}{Lk} \sum_{l=1}^{L} \sum_{k'=1}^{k} \Pr\{\mathcal{E}^l_{k'}\}. \tag{15}$$

Corollary 2: Let $p_b(\tilde{W})$ be the average BER of a binary code $\mathcal{C}$ over the channel $\tilde{W}$. Then the average BER $p_b$ of the code
$\mathcal{C}$ used with PBICM is equal to $p_b(\tilde{W})$.

Proof: Follows directly from Theorem 1 and from the definition of the average BER in (15). ■

IV. Parallel BICM: Information Theoretical Analysis 

In the previous section we defined the PBICM scheme and analyzed its basic error probability properties. The equivalence
of the channel between each encoder-decoder pair that was established in Theorem 1 enables a full information-theoretic
analysis of the scheme. We show that the highest rate achievable by PBICM (the PBICM capacity) is equal to the BICM
capacity as in Equation (6), which should not be a surprise. In the finite-length regime, we derive error exponent and channel
dispersion results as information-theoretic measures of the optimal PBICM performance at finite length.



A. Capacity 

Let the PBICM capacity of $W$, $C^{PBICM}(W)$, be the highest achievable rate for reliable communication over the channel $W$
with PBICM and a given mapping $\mu$. (As usual, reliable communication means a vanishing codeword error probability as the
code length $n$ goes to infinity.)

Theorem 2: The PBICM capacity is given by

$$C^{PBICM}(W) = L \cdot C(\tilde{W}) = \sum_{s=1}^{L} C(W_s) = C^{BICM}(W). \tag{16}$$

Proof:

Achievability: Let $\mathcal{C}^{(n)}$ be a series of (binary) capacity-achieving codes for the channel $\tilde{W}$, and let $p_e^{(n)}(\tilde{W})$ be the
corresponding (vanishing) codeword error probabilities. By Corollary 1, the overall error probability of PBICM with a binary
code is upper bounded by $L$ times the error probability of the same code over the channel $\tilde{W}$. Therefore, when the codes $\mathcal{C}^{(n)}$
are used with PBICM, the overall error probability is bounded by $L \cdot p_e^{(n)}(\tilde{W})$ and also vanishes with $n$. Since there are $L$
instances of the channel $\tilde{W}$, the rate $L \cdot C(\tilde{W})$ is achievable by PBICM.

Converse: Let $\mathcal{C}^{(n)}$ be a series of binary codes that are used with PBICM and achieve a vanishing overall error probability
$p_e^{(n)}$, and suppose that the overall PBICM rate is given by $L \cdot R$ (a rate of $R$ at each encoder-decoder pair). By Corollary
1, the codeword error probability of a code over $\tilde{W}$ is upper bounded by the overall error probability of the same code used
in PBICM. Therefore, if $p_e^{(n)}$ vanishes as $n \to \infty$, then the error probability over $\tilde{W}$ must also vanish, and therefore the
communication rate between each encoder-decoder pair must be upper bounded by $C(\tilde{W})$, and the overall rate cannot surpass
$L \cdot C(\tilde{W})$.

All that remains is to calculate the capacity of $\tilde{W}$:

$$C(\tilde{W}) = I(B; Y, S, D) = I(B; Y, S|D) = \tfrac{1}{2}\big(I(B; Y, S|D = 0) + I(B; Y, S|D = 1)\big), \tag{17}$$

where the second equality holds because $D$ is independent of $B$. When $D = 0$ we get the channel $\bar{W}$ exactly, and when $D = 1$ we get the channel $\bar{W}$ with its input symbols always flipped.
Either way, $I(B; Y, S|D = d)$ is equal to the capacity of $\bar{W}$. Using Lemma 1 we get that

$$C(\tilde{W}) = C(\bar{W}) = \frac{1}{L} \sum_{s=1}^{L} C(W_s). \tag{18}$$

■
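Equations (17) and (18) can be checked numerically on small examples. The sketch below, under hypothetical choices of sub-channels (two binary symmetric channels and a Z-channel, all with a binary output), builds $\tilde{W}$ explicitly as a discrete memoryless channel whose output is the triple $(s, d, y)$, evaluates $I(B; Y, S, D)$ with an equiprobable input, and compares it with $\frac{1}{L}\sum_s C(W_s)$. The sub-channels are arbitrary and are not derived from any particular mapper; the helper names are our own.

```python
import math


def mutual_information(channel):
    """I(X;Y) in bits for rows channel[x][y] with a uniform input."""
    nx, ny = len(channel), len(channel[0])
    p_y = [sum(row[y] for row in channel) / nx for y in range(ny)]
    return sum(
        w / nx * math.log2(w / p_y[y])
        for row in channel
        for y, w in enumerate(row)
        if w > 0
    )


def dithered_channel(subs):
    """Rows tilde_W[b][(s, d, y)] = W_s(y | b XOR d) / (2L), flattened (eqs. (29)-(30))."""
    L = len(subs)
    return [
        [subs[s][b ^ d][y] / (2 * L)
         for s in range(L) for d in (0, 1) for y in range(len(subs[s][0]))]
        for b in (0, 1)
    ]


subs = [
    [[0.95, 0.05], [0.05, 0.95]],  # BSC(0.05)
    [[0.8, 0.2], [0.2, 0.8]],      # BSC(0.2)
    [[1.0, 0.0], [0.3, 0.7]],      # Z-channel
]
lhs = mutual_information(dithered_channel(subs))
rhs = sum(mutual_information(w) for w in subs) / len(subs)
print(lhs, rhs)
```

The agreement reflects the two steps of the proof: the dither does not change the mutual information, and the uniform state turns $I(B;Y|S)$ into an average over the sub-channels.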

A note regarding the capacity proof: one might be tempted to try to prove the capacity theorem for PBICM without
dither, since with random coding the code $\mathcal{C}$ is merely an i.i.d. binary random vector. This approach fails for the
following reason. In the decoding of each codeword, the correctness of the model $\bar{W}$ relies on the fact that the other codewords are
i.i.d. signals. Since PBICM requires a single code for all the $L$ levels, such a condition can never be met. It is possible to prove
the achievability without dither when using a different random code at each level, but such an approach will not guarantee the
existence of a single code, as required by PBICM.

B. Error Exponent 

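The error exponent of a channel $W$ is defined by

$$E(R) \triangleq \lim_{n \to \infty} -\frac{1}{n} \log p_e(n), \tag{19}$$

where $p_e(n)$ is the average codeword error probability for the best code of length $n$ (and rate $R$). A lower bound on the error exponent for
memoryless channels is the random coding error exponent [3], which is given by

$$E_r(R) = \max_{0 \le \rho \le 1} \max_{P_X} \big\{E_0(\rho, P_X) - \rho R\big\}, \tag{20}$$

$$E_0(\rho, P_X) = -\log \sum_{y \in \mathcal{Y}} \Big[\sum_{x \in \mathcal{X}} P_X(x)\, W(y|x)^{\frac{1}{1+\rho}}\Big]^{1+\rho}. \tag{21}$$

Since we consider equiprobable inputs only, we omit the dependence of $E_0(\rho)$ on $P_X$, and omit the maximization w.r.t. $P_X$ in
(20).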
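For a concrete feel for the random coding exponent with an equiprobable input, the following minimal sketch evaluates Gallager's $E_0(\rho)$ for a channel given as `channel[x][y]` and approximates the maximization over $\rho \in [0, 1]$ in (20) on a uniform grid (a simplification; a finer grid or a scalar optimizer would be more accurate). The helper names, the grid size, and the binary symmetric channel example are illustrative choices, not part of the source.

```python
import math


def gallager_e0(channel, rho):
    """Gallager's E_0(rho) in bits for rows channel[x][y] with a uniform input."""
    nx, ny = len(channel), len(channel[0])
    total = 0.0
    for y in range(ny):
        inner = sum(channel[x][y] ** (1 / (1 + rho)) for x in range(nx)) / nx
        total += inner ** (1 + rho)
    return -math.log2(total)


def random_coding_exponent(channel, rate, grid=1000):
    """E_r(R) = max over rho in [0, 1] of E_0(rho) - rho * R, on a uniform grid."""
    return max(gallager_e0(channel, i / grid) - (i / grid) * rate for i in range(grid + 1))


def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return [[1 - p, p], [p, 1 - p]]


for rate in (0.0, 0.2, 0.4):
    print(rate, random_coding_exponent(bsc(0.1), rate))
```

Two sanity checks follow from the definitions: at $R = 0$ the maximum is attained at $\rho = 1$, and at $R = C$ the exponent is zero because $E_0$ is concave with slope $C$ at $\rho = 0$.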

Other known bounds on the error exponent include the expurgated error exponent (a lower bound), the sphere packing error
exponent (an upper bound) and others [3]. The random coding and sphere packing exponents coincide for rates above the
critical rate, and therefore the error exponent is known precisely at these rates.



1) PBICM error exponent: 

Similarly to (19), we define the PBICM error exponent:

Definition 3: For a given channel $W$ and a mapping $\mu$, let $E^{PBICM}(R)$ be defined as

$$E^{PBICM}(R) \triangleq \lim_{n \to \infty} -\frac{1}{n} \log p_e(n), \tag{22}$$

where $p_e(n)$ is the average codeword error probability for the best PBICM scheme with block length $n$.
Using Corollary 1 we can calculate the PBICM exponent using the error exponent of $\tilde{W}$:

Theorem 3: The PBICM error exponent of a channel $W$ is given by

$$E^{PBICM}(R) = \tilde{E}(R/L), \tag{23}$$

where $\tilde{E}(\cdot)$ is the error exponent function of the binary channel $\tilde{W}$.

Proof: Let $\mathcal{C}^{(n)}$ be a series of binary codes. Denote their codeword error probabilities over the channel $\tilde{W}$ by $p_e^{(n)}(\tilde{W})$,
and let $p_e^{(n)}$ be the error probabilities of the corresponding PBICM schemes with $\mathcal{C}^{(n)}$ used as underlying codes. It follows from
(12) that

$$-\frac{1}{n} \log\big(L \cdot p_e^{(n)}(\tilde{W})\big) \le -\frac{1}{n} \log p_e^{(n)} \le -\frac{1}{n} \log p_e^{(n)}(\tilde{W}). \tag{24}$$

By taking $n \to \infty$ the factor of $L$ vanishes (since $\frac{1}{n} \log L \to 0$), and we get that for any series of codes,

$$\lim_{n \to \infty} -\frac{1}{n} \log p_e^{(n)} = \lim_{n \to \infty} -\frac{1}{n} \log p_e^{(n)}(\tilde{W}). \tag{25}$$

The above equation holds for the series of best codes for the channel $\tilde{W}$, as well as for the series of best codes for PBICM.
Therefore the equality holds for the sequence of best codes on either side. Since the rate for PBICM is $L$ times the rate for
coding on $\tilde{W}$, the proof is concluded. ■

2) The error exponent of $\tilde{W}$:

The channel $\tilde{W}$ has a special structure, and is related to the binary sub-channels $W_s$. We now calculate two basic bounds
on the error exponent of $\tilde{W}$ in terms of the sub-channels $W_s$. By Theorem 3, the PBICM error exponent of the channel $W$
can be bounded accordingly.

Theorem 4: Let $\tilde{E}(R)$ be the error exponent of the channel $\tilde{W}$. It can be bounded as follows:

Random coding:

$$\tilde{E}(R) \ge \tilde{E}_r(R) = \max_{\rho \in [0,1]} \big\{\tilde{E}_0(\rho) - \rho R\big\}, \tag{26}$$

where

$$\tilde{E}_0(\rho) = -\log \mathbb{E}\Big[2^{-E_0^{(S)}(\rho)}\Big], \tag{27}$$

$E_0^{(s)}(\rho)$ is the $E_0$ function of the channel $W_s$, and the expectation is w.r.t. the state $S$, which is drawn uniformly from $\{1, \ldots, L\}$.

Sphere packing:

$$\tilde{E}(R) \le \tilde{E}_{sp}(R) = \sup_{\rho > 0} \big\{\tilde{E}_0(\rho) - \rho R\big\}, \tag{28}$$

where $\tilde{E}_0(\rho)$ is given in (27).

Proof:

The bounds in the theorem are the original random coding and sphere packing exponents [3]. The proof, therefore, boils
down to the simplification of the $E_0$ function to the form of (27).

Consider the channel $\tilde{W}$ (Definition 2) with binary input $B$ and outputs $(Y, S, D)$. Since $\tilde{W}$ is equivalent to the channel $\bar{W}$
with input $B \oplus D$, where $D$ is an equiprobable binary RV (known at the receiver), we get that

$$\tilde{W}(y, s, d|b) = \frac{1}{2}\, \bar{W}(y, s|b \oplus d). \tag{29}$$

The channel $\bar{W}$, in turn, is nothing more than the channel $W_s$ with the additional output $S$. This yields

$$\frac{1}{2}\, \bar{W}(y, s|b \oplus d) = \frac{1}{2L}\, W_s(y|b \oplus d). \tag{30}$$



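Combining the above, the $E_0$ function of $\tilde{W}$ is given by
$$
\begin{aligned}
E_0(\rho) &= -\log \sum_{s\in\{1..L\}}\sum_{d\in\{0,1\}}\sum_{y\in\mathcal{Y}}\Big[\sum_{b\in\{0,1\}} P_B(b)\,\tilde{W}(y,s,d\,|\,b)^{\frac{1}{1+\rho}}\Big]^{1+\rho}\\
&= -\log \sum_{s\in\{1..L\}}\sum_{d\in\{0,1\}}\sum_{y\in\mathcal{Y}}\Big[\sum_{b\in\{0,1\}} \tfrac{1}{2}\Big(\tfrac{1}{2L}\,W_s(y\,|\,b\oplus d)\Big)^{\frac{1}{1+\rho}}\Big]^{1+\rho}\\
&\overset{(a)}{=} -\log \sum_{s\in\{1..L\}}\frac{1}{L}\sum_{y\in\mathcal{Y}}\Big[\sum_{b'\in\{0,1\}} \tfrac{1}{2}\,W_s(y\,|\,b')^{\frac{1}{1+\rho}}\Big]^{1+\rho}\\
&\overset{(b)}{=} -\log \sum_{s\in\{1..L\}}\frac{1}{L}\,2^{-E_0^{(s)}(\rho)} \qquad\text{(31)}\\
&= -\log \mathbb{E}\big[2^{-E_0^{(S)}(\rho)}\big]. \qquad\text{(32)}
\end{aligned}
$$

(a) follows by setting $b' = b\oplus d$ and noting that the summation result is independent of the value of $d$. (b) follows from the definition of $E_0^{(s)}(\rho)$ (the $E_0$ function for the channel $W_s$). ■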
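As a numerical illustration of (26)-(27), the following minimal sketch evaluates $E_0(\rho)$ of $\tilde{W}$ and the random coding exponent $E_r(R)$ for a toy case. It assumes, purely for illustration, that every sub-channel $W_s$ is a binary symmetric channel with a hypothetical crossover probability $p_s$, that logarithms are base 2, and that the maximization over $\rho\in[0,1]$ is approximated on a uniform grid. All function names are ours.

```python
import math

def capacity_bsc(p: float) -> float:
    """Capacity (bits) of a BSC with crossover probability p, 0 < p < 1/2."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)

def e0_bsc(p: float, rho: float) -> float:
    """Gallager E_0 (bits) of a BSC(p) with equiprobable input."""
    a = 1.0 / (1.0 + rho)
    # Both outputs y contribute the same inner sum 1/2 * ((1-p)^a + p^a).
    inner = 0.5 * ((1.0 - p) ** a + p ** a)
    return -math.log2(2.0 * inner ** (1.0 + rho))

def e0_tilde(ps, rho: float) -> float:
    """E_0 of the dithered binary channel, eq. (27): -log2 E[2^(-E_0^(S))], S uniform."""
    return -math.log2(sum(2.0 ** (-e0_bsc(p, rho)) for p in ps) / len(ps))

def random_coding_exponent(ps, rate: float, grid: int = 1000) -> float:
    """E_r(R) of eq. (26), with the max over rho in [0, 1] taken on a grid."""
    return max(e0_tilde(ps, k / grid) - (k / grid) * rate for k in range(grid + 1))

# Hypothetical sub-channel crossover probabilities (L = 2).
ps = [0.05, 0.2]
c_tilde = sum(capacity_bsc(p) for p in ps) / len(ps)  # capacity of W~ (bits per binary use)
for r in (0.0, 0.1, 0.2, 0.3):
    print(f"R={r:.1f}  E_r(R)={random_coding_exponent(ps, r):.4f}")
```

The exponent computed here is the one of the binary channel $\tilde{W}$; the corresponding PBICM exponent then follows through Theorem 3.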



Several notes can be made: 

• It is well known that the random coding and sphere packing exponents coincide at rates above the critical rate. Therefore the exact error exponent of $\tilde{W}$ is known at rates above its critical rate $R_{cr}$. It follows that the exact PBICM error exponent is known at rates above $R_{cr}^{PBICM} = L\cdot R_{cr}$, which we define to be the PBICM critical rate.

• In Theorem 4 we have shown that the random coding and sphere packing bounds take a compact form because of the special structure of the channel $\tilde{W}$. Clearly, following Theorem 3, every bound on $E(R)$ of $\tilde{W}$ serves as a bound on the PBICM error exponent. However, for other bounds (such as the expurgated error exponent $E_{ex}$), no compact form could be found. Such bounds can, of course, still be applied to bound $E^{PBICM}(R)$.

• The $E_0$ function of the channel $\tilde{W}$ is equal to the $E_0$ function of the channel $\bar{W}$. This can easily be seen from the proof above: the expression obtained after step (a) in (31) is, by definition, $E_0$ for $\bar{W}$.

• In [7], the authors used the model of $\bar{W}$ for calculating the error exponent of BICM. It is claimed there that $E_0$ of this channel is given by [7, Eq. (37)]:
$$E_0(\rho) = \frac{1}{L}\sum_{s=1}^{L}E_0^{(s)}(\rho). \tag{33}$$
As we have just shown in Theorem 4, this is not the exact expression. In fact, it can be shown that $E_0(\rho) \le \frac{1}{L}\sum_{s=1}^{L}E_0^{(s)}(\rho)$. This follows directly from the convexity of the function $2^{-(\cdot)}$ and Jensen's inequality. Therefore the incorrect expression in [7, Eq. (37)] always overestimates the value of $E_0(\rho)$, and consequently the resulting $E_r(R)$ expression also overestimates the true random coding exponent.
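The Jensen step can be written out explicitly. With $S$ uniform on $\{1..L\}$ and $x\mapsto 2^{-x}$ convex,
$$\mathbb{E}\big[2^{-E_0^{(S)}(\rho)}\big] \ge 2^{-\mathbb{E}[E_0^{(S)}(\rho)]} \quad\Longrightarrow\quad E_0(\rho) = -\log\mathbb{E}\big[2^{-E_0^{(S)}(\rho)}\big] \le \mathbb{E}\big[E_0^{(S)}(\rho)\big] = \frac{1}{L}\sum_{s=1}^{L}E_0^{(s)}(\rho),$$
where the implication uses the fact that $-\log(\cdot)$ is decreasing, and equality holds if and only if all $E_0^{(s)}(\rho)$ coincide. As a worked example with illustrative values (not taken from any figure), let $L=2$, $E_0^{(1)}(\rho)=0.1$ and $E_0^{(2)}(\rho)=0.5$ bits. Then (33) gives $0.3$ bits, whereas (27) gives $-\log_2\big((2^{-0.1}+2^{-0.5})/2\big)\approx-\log_2(0.820)\approx 0.286$ bits.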

C. Channel Dispersion 

An alternative information-theoretic measure for quantifying coding performance at finite block lengths is the channel dispersion. Suppose that a codeword error probability $p_e$ and a codeword length $n$ are given. We can then seek the maximal achievable rate $R$ given $p_e$ and $n$.

It turns out that for fixed $p_e$ and $n$, the gap to the channel capacity is approximately proportional to $Q^{-1}(p_e)/\sqrt{n}$, where $Q(\cdot)$ is the complementary Gaussian cumulative distribution function. The square of the proportionality constant is called the channel dispersion. Formally, the (operational) channel dispersion is defined as follows [6]:

Definition 4: The dispersion $V(W)$ of a channel $W$ with capacity $C$ is defined as
$$V(W) = \lim_{p_e\to 0}\limsup_{n\to\infty}\, n\cdot\left(\frac{C - R(n,p_e)}{Q^{-1}(p_e)}\right)^2, \tag{34}$$



where R(n,p e ) is the highest achievable rate for codeword error probability p e and codeword length n. 
In 1962, Strassen [4] used the Gaussian approximation to derive the following result for DMCs:
$$R(n,p_e) = C - \sqrt{\frac{V}{n}}\,Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right), \tag{35}$$
where $C$ is the channel capacity, and the new quantity $V$ is the (information-theoretic) dispersion, which is given by
$$V = \mathrm{VAR}(i(X;Y)), \tag{36}$$
where $i(x;y)$ is the information spectrum, given by
$$i(x;y) = \log\frac{P_{XY}(x,y)}{P_X(x)P_Y(y)}, \tag{37}$$

and the distribution of $X$ is the capacity-achieving distribution that minimizes $V$. Strassen's result proves that the dispersion of DMCs is equal to $\mathrm{VAR}(i(X;Y))$. This result was recently tightened (and extended to the power-constrained AWGN channel) in [5]. It is also known that the channel dispersion and the error exponent are related as follows: for a channel with capacity $C$ and dispersion $V$, the error exponent can be approximated, for rates close to capacity, by
$$E(R) \cong \frac{(C-R)^2}{2V\ln 2}.$$
See [6] for details on the early origins of this approximation, which go back to Shannon.
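To make (35) and the exponent approximation concrete, the sketch below evaluates both for a binary symmetric channel, a channel choice that is ours and not from the text. It assumes base-2 logarithms, drops the $O(\log n/n)$ term of (35), and computes $Q^{-1}$ with Python's `statistics.NormalDist`; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def q_inv(eps: float) -> float:
    """Inverse Gaussian tail function: Q^{-1}(eps) = -Phi^{-1}(eps)."""
    return -NormalDist().inv_cdf(eps)

def bsc_capacity_dispersion(p: float):
    """Capacity (bits) and dispersion (bits^2) of a BSC(p), 0 < p < 1/2."""
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return 1.0 - h, p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2

def normal_approx_rate(C: float, V: float, n: int, pe: float) -> float:
    """R(n, pe) ~ C - sqrt(V/n) Q^{-1}(pe): eq. (35) without the O(log n / n) term."""
    return C - math.sqrt(V / n) * q_inv(pe)

def approx_exponent(C: float, V: float, R: float) -> float:
    """E(R) ~ (C - R)^2 / (2 V ln 2), meaningful only for R close to C."""
    return (C - R) ** 2 / (2.0 * V * math.log(2.0))

C, V = bsc_capacity_dispersion(0.11)
for n in (100, 1000, 10000):
    print(n, normal_approx_rate(C, V, n, 1e-3))
```

Substituting the rate from (35) into the exponent approximation gives $n\,E(R)\approx (Q^{-1}(p_e))^2/(2\ln 2)$, independent of $C$ and $V$, which is one way to see that the two approximations are consistent.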

1) PBICM dispersion: In order to estimate the finite-block performance of PBICM schemes we extend the dispersion 
definition as follows: 

Definition 5: The PBICM dispersion $V^{PBICM}(W)$ of a channel $W$ with a given mapping $\mu$ and PBICM capacity $C^{PBICM}(W)$ is defined as
$$V^{PBICM}(W) = \lim_{p_e\to 0}\limsup_{n\to\infty}\, n\cdot\left(\frac{C^{PBICM}(W) - R(n,p_e)}{Q^{-1}(p_e)}\right)^2, \tag{38}$$

where R(n,p e ) is the highest achievable rate for any PBICM scheme with a given n and p e . 

Relying on the relationship between the PBICM scheme and the binary channel $\tilde{W}$, we can show the following:
Theorem 5: Let $n$ be a given block length and let $p_e$ be a given codeword error probability. The highest achievable rate attained using PBICM, $R^{PBICM}(n,p_e)$, is bounded from above and below by
$$R^{PBICM}(n,p_e) \ge C^{PBICM}(W) - \sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}\!\left(\frac{p_e}{L}\right) + O\!\left(\frac{\log n}{n}\right), \tag{39}$$
$$R^{PBICM}(n,p_e) \le C^{PBICM}(W) - \sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right). \tag{40}$$
As a result, the PBICM dispersion is given by
$$V^{PBICM}(W) = L^2\,V(\tilde{W}). \tag{41}$$

Proof:
Direct: From the achievability proof of (35) [6, Theorem 45], there must exist an $(n, R', p'_e = p_e/L)$ binary code for $\tilde{W}$ that satisfies
$$R' \ge C(\tilde{W}) - \sqrt{\frac{V(\tilde{W})}{n}}\,Q^{-1}(p_e/L) + O\!\left(\frac{\log n}{n}\right). \tag{42}$$


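By the union bound over the $L$ binary codewords (see Corollary 1), the codeword error probability of the PBICM scheme based on this code is not greater than $L p'_e = p_e$. The rate of the PBICM scheme satisfies
$$
\begin{aligned}
R = L\cdot R' &\ge L\left[C(\tilde{W}) - \sqrt{\frac{V(\tilde{W})}{n}}\,Q^{-1}(p_e/L) + O\!\left(\frac{\log n}{n}\right)\right] \qquad\text{(43)}\\
&= C^{PBICM}(W) - \sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}(p_e/L) + O\!\left(\frac{\log n}{n}\right). \qquad\text{(44)}
\end{aligned}
$$
(See Appendix B for the big-O notation.)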



Converse: Suppose we have an $(n,R,p_e)$ PBICM scheme. Since the PBICM codeword is in error whenever any of its $L$ binary codewords is in error, the codeword error probability $p'_e$ of the underlying binary code is not greater than $p_e$. By (35), the rate $R'$ of the underlying binary code is bounded by
$$R' \le C(\tilde{W}) - \sqrt{\frac{V(\tilde{W})}{n}}\,Q^{-1}(p'_e) + O\!\left(\frac{\log n}{n}\right). \tag{45}$$



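Since $Q^{-1}(\cdot)$ is a decreasing function, the bound loosens by replacing $p'_e$ with the higher $p_e$. Therefore the overall rate $R$ is bounded by
$$
\begin{aligned}
R = L\cdot R' &\le L\left[C(\tilde{W}) - \sqrt{\frac{V(\tilde{W})}{n}}\,Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right)\right] \qquad\text{(46)}\\
&= C^{PBICM}(W) - \sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right). \qquad\text{(47)}
\end{aligned}
$$
■

PBICM dispersion: Rewriting (39) and (40), we get
$$\sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right) \le C^{PBICM}(W) - R(n,p_e) \le \sqrt{\frac{L^2V(\tilde{W})}{n}}\,Q^{-1}\!\left(\frac{p_e}{L}\right) + O\!\left(\frac{\log n}{n}\right), \tag{48}$$
or, after multiplying by $\sqrt{n}/Q^{-1}(p_e)$,
$$\sqrt{L^2V(\tilde{W})} + O\!\left(\frac{\log n}{\sqrt{n}}\right) \le \sqrt{n}\,\frac{C^{PBICM}(W) - R(n,p_e)}{Q^{-1}(p_e)} \le \sqrt{L^2V(\tilde{W})}\,\frac{Q^{-1}(p_e/L)}{Q^{-1}(p_e)} + O\!\left(\frac{\log n}{\sqrt{n}}\right). \tag{49}$$
Taking the limit w.r.t. $n$ yields
$$\sqrt{L^2V(\tilde{W})} \le \limsup_{n\to\infty}\sqrt{n}\,\frac{C^{PBICM}(W) - R(n,p_e)}{Q^{-1}(p_e)} \le \sqrt{L^2V(\tilde{W})}\,\frac{Q^{-1}(p_e/L)}{Q^{-1}(p_e)}, \tag{50}$$
or
$$L^2V(\tilde{W}) \le \limsup_{n\to\infty} n\left(\frac{C^{PBICM}(W) - R(n,p_e)}{Q^{-1}(p_e)}\right)^2 \le L^2V(\tilde{W})\left(\frac{Q^{-1}(p_e/L)}{Q^{-1}(p_e)}\right)^2. \tag{51}$$
By noting that $\lim_{\epsilon\to 0}\frac{(Q^{-1}(\epsilon))^2}{2\ln(1/\epsilon)} = 1$ (see Appendix A), we get that
$$\lim_{p_e\to 0}\left(\frac{Q^{-1}(p_e/L)}{Q^{-1}(p_e)}\right)^2 = \lim_{p_e\to 0}\frac{\ln(L/p_e)}{\ln(1/p_e)} = \lim_{p_e\to 0}\frac{\ln L + \ln(1/p_e)}{\ln(1/p_e)} = 1, \tag{52}$$
which leads to the desired result:
$$V^{PBICM}(W) = \lim_{p_e\to 0}\limsup_{n\to\infty} n\left(\frac{C^{PBICM}(W) - R(n,p_e)}{Q^{-1}(p_e)}\right)^2 = L^2V(\tilde{W}). \tag{53}$$
■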



Note that the PBICM dispersion result is not as tight as the corresponding result (35) for general coding schemes. The reason is the unavoidable use of the union bound when estimating the overall error probability of PBICM. In the dispersion proof for DMCs, the value of the dispersion is obtained even without taking the limit w.r.t. $p_e$. However, for values of interest the gap between $Q^{-1}(p_e)$ and $Q^{-1}(p_e/L)$ is not very large.
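The following sketch puts numbers on the leading-order terms of (39) and (40) and on the ratio $Q^{-1}(p_e/L)/Q^{-1}(p_e)$ that separates them. The values of $L$, $C(\tilde{W})$, $V(\tilde{W})$ and $n$ are hypothetical, the $O(\log n/n)$ terms are dropped, and $C^{PBICM}(W) = L\cdot C(\tilde{W})$ is assumed, as in (43)-(44).

```python
import math
from statistics import NormalDist

def q_inv(eps: float) -> float:
    """Inverse Gaussian tail function: Q^{-1}(eps) = -Phi^{-1}(eps)."""
    return -NormalDist().inv_cdf(eps)

def pbicm_rate_bounds(L: int, c_tilde: float, v_tilde: float, n: int, pe: float):
    """Leading-order versions of (39) and (40), with the O(log n / n) terms dropped.

    Assumes C^PBICM(W) = L * C(W~). Returns (lower, upper).
    """
    c_pbicm = L * c_tilde
    backoff = math.sqrt(L * L * v_tilde / n)
    lower = c_pbicm - backoff * q_inv(pe / L)  # achievability side, (39)
    upper = c_pbicm - backoff * q_inv(pe)      # converse side, (40)
    return lower, upper

# Hypothetical parameters: L = 4 parallel codes, C(W~) = 0.6 bit, V(W~) = 0.4 bit^2.
L, c_tilde, v_tilde, n = 4, 0.6, 0.4, 2000
for pe in (1e-2, 1e-3, 1e-6, 1e-9):
    lo, hi = pbicm_rate_bounds(L, c_tilde, v_tilde, n, pe)
    ratio = q_inv(pe / L) / q_inv(pe)
    print(f"pe={pe:.0e}  bounds=({lo:.4f}, {hi:.4f})  ratio={ratio:.3f}")
```

The ratio exceeds 1 and shrinks toward 1 as $p_e$ decreases, in line with (52), so the two bounds pinch together only slowly.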

2) The dispersion of $\tilde{W}$:

As in the error exponent case, the PBICM dispersion of a channel is related to the dispersion of the binary channel $\tilde{W}$. We now calculate it explicitly from the dispersions of the sub-channels $W_i$.

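Theorem 6: The dispersion of the channel $\tilde{W}$ is given by
$$V(\tilde{W}) = V(\bar{W}) = \mathbb{E}[V(W_S)] + \mathrm{VAR}[C(W_S)] = \frac{1}{L}\sum_{s=1}^{L}V(W_s) + \mathrm{VAR}(C(W_S)), \tag{54}$$
where $\mathrm{VAR}(C(W_S))$ is the statistical variance of the capacity of $W_S$, i.e.
$$\mathrm{VAR}(C(W_S)) = \mathbb{E}[C^2(W_S)] - \mathbb{E}^2[C(W_S)]. \tag{55}$$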



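Proof: Consider the channel $\tilde{W}$ (Definition 2) with binary input $B$ and outputs $(Y,S,D)$, and recall that
$$P_{YSD|B}(y,s,d\,|\,b) = \tilde{W}(y,s,d\,|\,b) = \tfrac{1}{2}\,\bar{W}(y,s\,|\,b\oplus d) = \tfrac{1}{2L}\,W_s(y\,|\,b\oplus d). \tag{56}$$
We first calculate the dispersion of $\bar{W}$. Since $S$ and the channel input $B$ are independent, the information spectrum is given by
$$
\begin{aligned}
i(b;y,s) &= \log\frac{P_{YSB}(y,s,b)}{P_{YS}(y,s)P_B(b)} = \log\frac{P_{Y|SB}(y|s,b)P_S(s)P_B(b)}{P_{YS}(y,s)P_B(b)} \qquad\text{(57)}\\
&= \log\frac{P_{Y|SB}(y|s,b)}{P_{Y|S}(y|s)} \triangleq i(b;y\,|\,s). \qquad\text{(58)}
\end{aligned}
$$
Using this notation, the dispersion of the channel $W_s$ is given by
$$V(W_s) = \mathrm{VAR}\big(i(B;Y|S)\,\big|\,S=s\big) = \mathbb{E}\big[i^2(B;Y|S)\,\big|\,S=s\big] - C(W_s)^2.$$
Next, the dispersion of the channel $\bar{W}$ is given as follows:
$$
\begin{aligned}
V(\bar{W}) &= \mathrm{VAR}(i(B;Y,S)) = \mathrm{VAR}(i(B;Y|S))\\
&\overset{(a)}{=} \mathbb{E}\big[\mathrm{VAR}[i(B;Y|S)\,|\,S]\big] + \mathrm{VAR}\big[\mathbb{E}[i(B;Y|S)\,|\,S]\big]\\
&= \mathbb{E}[V(W_S)] + \mathrm{VAR}[C(W_S)]\\
&= \frac{1}{L}\sum_{s=1}^{L}V(W_s) + \mathrm{VAR}(C(W_S)). \qquad\text{(59)}
\end{aligned}
$$
(a) follows from the law of total variance.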

Finally, the dispersion of the channel $\tilde{W}$ is calculated as follows. Let us combine the outputs $Y$ and $S$ of $\tilde{W}$ into a single output $Z = (Y,S)$. We then end up with a channel with input $B$ and outputs $Z$ and $D$ (see Fig. 5). Similarly to (57), the information spectrum is given by
$$i(b;z,d) \triangleq \log\frac{P_{ZDB}(z,d,b)}{P_{ZD}(z,d)P_B(b)} = i(b;z\,|\,d). \tag{60}$$
Following (59), we get that
$$V(\tilde{W}) = \mathbb{E}[V(W_D)] + \mathrm{VAR}[C(W_D)] = \frac{1}{2}\sum_{d\in\{0,1\}}V(W_d) + \mathrm{VAR}(C(W_D)), \tag{61}$$
where $W_d$ is the channel $\bar{W}$ with its input XORed with the value $d$.

Since only equiprobable inputs are considered, it follows that $C(W_0) = C(W_1) = C(\bar{W})$ and that $V(W_0) = V(W_1) = V(\bar{W})$. Therefore $\mathrm{VAR}(C(W_D)) = 0$, and consequently $V(\tilde{W}) = V(\bar{W})$, as required. ■

Note that since a larger dispersion means a larger backoff from capacity (see (35)), the term $\mathrm{VAR}(C(W_S))$ can be thought of as a penalty added to the average dispersion of the sub-channels, $\mathbb{E}[V(W_S)]$. This penalty grows as the capacities of the sub-channels $W_i$ become more spread out.
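Theorem 6 can be checked numerically on a toy example. The sketch below assumes, for illustration only, that the sub-channels $W_s$ are binary symmetric channels with hypothetical crossover probabilities. It enumerates the information spectrum of $\bar{W}$ (output $(Y,S)$) and of $\tilde{W}$ (output $(Y,S,D)$) directly and compares both variances with $\mathbb{E}[V(W_S)] + \mathrm{VAR}[C(W_S)]$ from (54). Logarithms are base 2, inputs are equiprobable, and all names are ours.

```python
import math

def bsc_c_v(p: float):
    """Capacity (bits) and dispersion (bits^2) of a BSC(p) with equiprobable input."""
    c = 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)
    v = p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2
    return c, v

def info_density_moments(cond):
    """Mean and variance of i(b; out) = log2 P(out|b)/P(out) for equiprobable b.

    `cond` maps (out, b) -> P(out | b) and must contain both b = 0 and b = 1 for each out.
    """
    outs = {o for (o, _) in cond}
    p_out = {o: 0.5 * (cond[(o, 0)] + cond[(o, 1)]) for o in outs}
    m1 = m2 = 0.0
    for (o, b), w in cond.items():
        if w > 0.0:
            i = math.log2(w / p_out[o])
            m1 += 0.5 * w * i  # P(out, b) = P(b) P(out|b) with P(b) = 1/2
            m2 += 0.5 * w * i * i
    return m1, m2 - m1 * m1

ps = [0.02, 0.1, 0.3]  # hypothetical BSC sub-channels W_s, so L = 3
L = len(ps)

def w_s(s, y, b):
    return 1.0 - ps[s] if y == b else ps[s]

# W-bar: output (y, s), S uniform.  W-tilde: output (y, s, d), input XORed with d.
bar = {((y, s), b): w_s(s, y, b) / L
       for s in range(L) for y in (0, 1) for b in (0, 1)}
tilde = {((y, s, d), b): 0.5 * w_s(s, y, b ^ d) / L
         for s in range(L) for y in (0, 1) for d in (0, 1) for b in (0, 1)}

cs, vs = zip(*(bsc_c_v(p) for p in ps))
mean_c = sum(cs) / L
formula = sum(vs) / L + sum((c - mean_c) ** 2 for c in cs) / L  # E[V(W_S)] + VAR[C(W_S)]
print(info_density_moments(bar)[1], info_density_moments(tilde)[1], formula)
```

For BSCs the equiprobable input is capacity-achieving, so the conditional mean of the information spectrum given $S=s$ is exactly $C(W_s)$, which is what step (a) in (59) relies on.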

V. Numerical Results 

In this section we numerically evaluate the information-theoretic quantities for PBICM. In particular, we calculate the PBICM random coding error exponent (see Theorems 3 and 4) in order to compare it with the mismatched decoding approach [8]. We consider the AWGN channel and the Rayleigh fading channel (with perfect channel state information at the receiver) over a wide range of SNR values and constellations. Gray mapping is used in all the examples.

A. Normalization: latency vs. complexity 

One way to compare the PBICM error exponent with the mismatched decoding exponent is to compare the error probabilities when the block length $n$ is fixed, which amounts to a direct comparison of the exponent values. This approach makes sense, since both schemes have the same latency of $n$ channel uses. As will be seen in the coming examples, for fixed $n$ the PBICM error exponent is inferior to the mismatched decoding exponent. This can also be seen by observing that the PBICM random coding exponent has a slope of $-1/L$ (in its straight-line region), whereas the mismatched decoding exponent has a slope of $-1$.



However, it should be taken into account that when the block length is $n$, the mismatched decoder works with a binary code of length $nL$. The complexity of the maximum-metric decoder is proportional to the number of codewords, $2^{nLR}$ (8), where $R$ is the rate of the binary code. On the other hand, the number of codewords in the PBICM scheme is only $L\cdot 2^{nR}$. To ensure a fair comparison from the complexity point of view, one has to allow the PBICM scheme to use a block length that is $L$ times the block length of the mismatched decoding scheme. Comparing the error probabilities of both schemes then amounts to comparing $nL\,E^{PBICM}$ with $n\,E^{mismatched}$. We therefore define the normalized PBICM error exponent as $L$ times the PBICM error exponent. We conclude that when complexity is more important (and latency is less important), the normalized PBICM exponent is the quantity of interest.
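The codeword-count arithmetic behind this normalization can be sketched as follows. The values of $n$, $L$ and $R$ are hypothetical, and decoding complexity is taken, as in the text, to be proportional to the number of codewords searched.

```python
import math

def log2_codewords_mismatched(n: int, L: int, R: float) -> float:
    """BICM with a maximum-metric decoder: one binary code of length n*L and rate R."""
    return n * L * R

def log2_codewords_pbicm(n: int, L: int, R: float) -> float:
    """PBICM with block length n: L binary codes of length n and rate R, decoded separately."""
    return math.log2(L) + n * R

n, L, R = 100, 4, 0.5  # hypothetical values
same_latency = log2_codewords_pbicm(n, L, R)         # PBICM using n channel uses
same_complexity = log2_codewords_pbicm(n * L, L, R)  # PBICM using n*L channel uses
print(log2_codewords_mismatched(n, L, R), same_latency, same_complexity)
```

With block length $nL$, the PBICM codeword count differs from the mismatched one only by the additive $\log_2 L$, so the two schemes have matched complexity. The PBICM error probability then decays as $2^{-nL\,E^{PBICM}}$, which is why $L\cdot E^{PBICM}$ is the exponent to compare against $E^{mismatched}$.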

One could argue, of course, that practical codes such as low-density parity-check (LDPC) codes would be used in practice, and these do not have exponential decoding complexity. On the other hand, such codes do not guarantee an exponentially decaying error probability.

B. Comparison with the Mismatched Decoding Exponent 

In the following figures we compare the PBICM error exponent with the mismatched decoding error exponent [8]. The figures show the (unconstrained) random coding error exponent of the channel, along with the mismatched decoding error exponent and the PBICM random coding error exponent (both normalized and unnormalized).

Figure 10 compares the exponents of 16QAM signaling over the Rayleigh fading channel at SNR = 5dB. Figure 11 shows the same graph, zoomed in on the capacity region. It can be seen that throughout the entire range of rates between zero and the BICM capacity, the normalized PBICM random coding exponent is higher (better) than the mismatched decoding exponent. Both BICM exponents are positive for rates below the BICM capacity, and the unconstrained random coding exponent reaches zero at the full channel capacity, as expected. A fact that might seem surprising at first glance is that the normalized PBICM exponent is better than the unconstrained random coding exponent at some rates. This is not a contradiction: recall that we compare coding schemes with the same maximum-likelihood (or maximum-metric) decoding complexity, and when the complexity is normalized, PBICM operates with a block length that is $L$ times the block length of the unconstrained scheme. The mismatched decoding exponent, by contrast, never exceeds the unconstrained exponent, a fact known as the data processing inequality for exponents (see e.g. [8, Proposition 3.2]).

Figure 12 shows a similar picture for 64-QAM signaling at SNR = 20dB (zoomed in on the capacity in Figure 13). Again, the normalized PBICM exponent outperforms the mismatched decoding exponent at all rates. In this case the BICM capacity is very close to the full channel capacity, which enables the normalized PBICM exponent to outperform the unconstrained exponent at essentially all rates.

On the Rayleigh fading channel, the same behavior was observed over all practical SNR ranges for 8PSK, 16QAM and 64QAM signaling: the normalized PBICM exponent outperformed the mismatched decoding exponent.

On the AWGN channel, neither exponent dominates the other: for 16QAM signaling at an SNR of 0dB (Fig. 14) the normalized PBICM exponent was better, while at an SNR of 5dB the mismatched exponent was better (Fig. 15).

VI. Discussion 

In this paper we have presented parallel bit-interleaved coded modulation (PBICM). The scheme is based on a finite-length interleaver and on adding a binary dither to the binary codewords. The scheme is shown to be equivalent to a binary memoryless channel; therefore it allows easy code design and exact analysis. The scheme was analyzed from an information-theoretic viewpoint, and the capacity, error exponent and dispersion of the PBICM scheme were calculated.

Another approach for analyzing BICM at finite block length was proposed in [8], where BICM is viewed as a mismatched decoder. Since this BICM setting uses a finite length, the random coding error exponent of the scheme can be calculated. In the previous section we compared the error exponents of PBICM and of the mismatched decoding approach. When the two schemes have the same latency (same block length), the PBICM exponent is inferior to that of the mismatched decoding approach. However, when the complexity of the scheme is considered (or equivalently, when the codeword length of the underlying code is the same), PBICM becomes comparable, and is generally better over the Rayleigh fading channel.

An important merit of the PBICM scheme is that it allows easy code design. In PBICM, one has to design a binary code for a memoryless binary channel. In recent years, methods have been developed to design very efficient binary codes, such as LDPC codes [10]. When designing LDPC codes, a desired property of the binary channel is that its output be symmetric. It turns out that no matter which channel W we have at hand, the resulting binary channel $\tilde{W}$ is always output-symmetric (when the output is the LLR).

Because of its simplicity and easy code design, we conclude that PBICM is an attractive practical communication scheme, 
which also allows exact theoretical analysis. 
Fig. 10. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5dB.
Fig. 11. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5dB (zoomed on the capacity).
Fig. 12. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20dB.
Fig. 13. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20dB (zoomed on the capacity).
Fig. 14. Random coding exponents over the AWGN channel with 16QAM signaling and SNR of 0dB.
Fig. 15. Random coding exponents over the AWGN channel with 16QAM signaling and SNR of 5dB.



Several additional notes can be made: 

• The analysis holds for any mapping $\mu$. Finding the mapping that yields the best performance at finite lengths is an open question (although Gray mapping is expected to perform well).

• The PBICM scheme includes, among other components, a binary dither. Dither is sometimes used as a theoretical tool for proving achievability results. In PBICM, however, it is an essential part of the scheme itself, and even the random-coding capacity proof becomes impossible without it. The main role of the dither is to validate the equivalence of the PBICM scheme to a binary memoryless channel. In addition, the binary dither is the element that symmetrizes the binary channel, which makes code design easier. This symmetrization property was also noticed in [11], where a similar dither is used with BICM (termed 'channel adapters'). However, the code design proposed in [11] relies on the assumption of an ideal interleaver.

• The channel is assumed to be memoryless. This captures many interesting channels, including the AWGN channel, and 
the memoryless fading channel with and without state known at the receiver (ergodic fading). For slow-fading channels, 
another interleaver (symbol interleaver) is required in order to transform the slowly fading channel into a fast-fading 
channel (cf. [0). 
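The symmetrizing effect of the dither can be seen on a toy example. The sketch below uses a hypothetical asymmetric binary channel as a stand-in for a single sub-channel $W_s$ (the state $S$ is omitted). It tests the output-symmetry condition on the LLR, $P(\mathrm{LLR}=l\,|\,B=0) = P(\mathrm{LLR}=-l\,|\,B=1)$, once for the plain channel and once for the dithered channel with output $(y,d)$ and transition probability $\tfrac{1}{2}W(y\,|\,b\oplus d)$. All names are ours.

```python
import math
from collections import defaultdict

# Hypothetical asymmetric binary channel W(y|b), keyed by (y, b).
W = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.2, (1, 1): 0.8}

def plain(out, b):
    """The channel itself, with output out = (y,)."""
    return W[(out[0], b)]

def dithered(out, b):
    """Dithered channel with output out = (y, d): 1/2 * W(y | b xor d), d known at the receiver."""
    y, d = out
    return 0.5 * W[(y, b ^ d)]

def llr_distribution(channel, outputs, b):
    """Distribution {llr: probability} of LLR = log2 P(out|0)/P(out|1), given input b."""
    dist = defaultdict(float)
    for out in outputs:
        p = channel(out, b)
        if p > 0.0:
            llr = math.log2(channel(out, 0)) - math.log2(channel(out, 1))
            dist[round(llr, 9)] += p
    return dist

def is_output_symmetric(channel, outputs, tol=1e-12):
    """True if P(LLR = l | b=0) == P(LLR = -l | b=1) for every LLR value l."""
    d0 = llr_distribution(channel, outputs, 0)
    d1 = llr_distribution(channel, outputs, 1)
    return all(abs(p - d1.get(-l, 0.0)) < tol for l, p in d0.items())

print(is_output_symmetric(plain, [(0,), (1,)]))
print(is_output_symmetric(dithered, [(y, d) for y in (0, 1) for d in (0, 1)]))
```

The plain channel fails the test, while the dithered one passes: flipping $d$ maps each output of input 0 onto an output of input 1 with the same probability and the negated LLR.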

Appendix A 
Approximation of the inverse Q-function 
The following is a useful approximation for the inverse Q-function. 
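Lemma 2:
$$\lim_{\epsilon\to 0}\frac{Q^{-1}(\epsilon)}{\sqrt{2\ln\frac{1}{\epsilon}}} = 1. \tag{62}$$

Proof:
We start with the well-known bounds on the Q function, valid for $x>0$:
$$\frac{1}{\sqrt{2\pi}\,x}\left(1-\frac{1}{x^2}\right)e^{-x^2/2} < Q(x) < \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}. \tag{63}$$
Dividing by the upper bound yields
$$1-\frac{1}{x^2} < \frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2}} < 1. \tag{64}$$
Taking the limit $x\to\infty$ gives
$$\lim_{x\to\infty}\frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2}} = 1. \tag{65}$$
Since the limit exists (and is positive), we may take the natural logarithm:
$$\lim_{x\to\infty}\ln\frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-x^2/2}} = 0, \tag{66}$$
i.e.,
$$\lim_{x\to\infty}\left[\ln Q(x) - \ln\frac{1}{\sqrt{2\pi}\,x} + \frac{x^2}{2}\right] = 0. \tag{67}$$
Since $\lim_{x\to\infty}\ln Q(x) = -\infty$, dividing by $\ln Q(x)$ gives
$$\lim_{x\to\infty}\frac{\ln Q(x) - \ln\frac{1}{\sqrt{2\pi}\,x} + \frac{x^2}{2}}{\ln Q(x)} = 0. \tag{68}$$
By the upper bound in (63), $|\ln Q(x)|$ grows at least like $x^2/2$, so $\ln\frac{1}{\sqrt{2\pi}\,x}\big/\ln Q(x)\to 0$, and (68) leads to
$$\lim_{x\to\infty}\frac{x^2}{-2\ln Q(x)} = 1.$$
Since $\lim_{\epsilon\to 0}Q^{-1}(\epsilon) = \infty$, we may substitute $x$ with $Q^{-1}(\epsilon)$ and write
$$\lim_{\epsilon\to 0}\frac{\big(Q^{-1}(\epsilon)\big)^2}{-2\ln\epsilon} = \lim_{x\to\infty}\frac{x^2}{-2\ln Q(x)} = 1, \tag{69}$$
which leads to
$$\lim_{\epsilon\to 0}\frac{Q^{-1}(\epsilon)}{\sqrt{2\ln\frac{1}{\epsilon}}} = 1. \tag{70}$$
■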
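The bounds (63) and the limit (62) can be checked numerically. This sketch computes $Q(x)$ from `math.erfc` and $Q^{-1}$ from Python's `statistics.NormalDist`. It illustrates that the ratio in Lemma 2 stays below 1 and approaches it only slowly, which is consistent with the lemma being used only in the limit $p_e\to 0$.

```python
import math
from statistics import NormalDist

def q(x: float) -> float:
    """Gaussian tail function Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(eps: float) -> float:
    """Inverse of Q, via Q^{-1}(eps) = -Phi^{-1}(eps)."""
    return -NormalDist().inv_cdf(eps)

def q_bounds(x: float):
    """Lower and upper bounds of (63) on Q(x), valid for x > 0."""
    upper = math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)
    return (1.0 - 1.0 / (x * x)) * upper, upper

# Check (63) at a few points.
for x in (1.5, 2.0, 3.0, 5.0, 8.0):
    lo, hi = q_bounds(x)
    assert lo < q(x) < hi

# The ratio in (62) for decreasing eps.
for eps in (1e-2, 1e-5, 1e-10, 1e-50, 1e-100):
    print(f"eps={eps:.0e}  ratio={q_inv(eps) / math.sqrt(2.0 * math.log(1.0 / eps)):.4f}")
```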



Appendix B 
Big-O notation: 

As usual, $f(n) = O(\epsilon_n)$ means that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $|f(n)| \le c\,\epsilon_n$, or equivalently, that
$$-c\,\epsilon_n \le f(n) \le c\,\epsilon_n. \tag{71}$$
$f_n = g_n + O(\epsilon_n)$ will mean that $f_n - g_n = O(\epsilon_n)$, i.e., that $f_n$ can be approximated by $g_n$ up to a term that is not greater in absolute value than $c\,\epsilon_n$ for some constant $c$.

Sometimes we will be interested in only one of the sides in (71). For that purpose, $f(n) \le O(\epsilon_n)$ means that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $f(n) \le c\,\epsilon_n$, and $f(n) \ge O(\epsilon_n)$ will mean that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $-f(n) \le c\,\epsilon_n$.

The different combinations of usages of the O notation are listed in the table below.

Notation                         Meaning
$f_n = O(\epsilon_n)$            $\exists\, c>0,\ n_0>0:\ \forall n>n_0,\ |f_n| \le c\,\epsilon_n$
$f_n = g_n + O(\epsilon_n)$      $f_n - g_n = O(\epsilon_n)$
$f_n \le O(\epsilon_n)$          $\exists\, c>0,\ n_0>0:\ \forall n>n_0,\ f_n \le c\,\epsilon_n$
$f_n \le g_n + O(\epsilon_n)$    $f_n - g_n \le O(\epsilon_n)$
$f_n \ge O(\epsilon_n)$          $-f_n \le O(\epsilon_n)$, or $\exists\, c>0,\ n_0>0:\ \forall n>n_0,\ -f_n \le c\,\epsilon_n$
$f_n \ge g_n + O(\epsilon_n)$    $f_n - g_n \ge O(\epsilon_n)$

Note that $f_n \le O(\epsilon_n)$ together with $f_n \ge O(\epsilon_n)$ is equivalent to $f_n = O(\epsilon_n)$, as expected.